A Comparative Study of Classification Based Personal E-mail Filtering

Authors

  • Yanlei Diao
  • Hongjun Lu
  • Dekai Wu
Abstract

This paper addresses personal E-mail filtering by casting it in the framework of text classification. Modeled as semi-structured documents, E-mail messages consist of a set of fields with predefined semantics and a number of variable-length free-text fields. While most work on classification concentrates on either structured data or free text, the work in this paper deals with both. To perform classification, a naive Bayesian classifier was designed and implemented, and a decision-tree-based classifier was implemented. The design considerations and implementation issues are discussed. Using a relatively large amount of real personal E-mail data, a comprehensive comparative study of the two classifiers was conducted. The importance of different features is reported. Results on other issues related to building an effective personal E-mail classifier are presented and discussed. It is shown that both classifiers can perform filtering with reasonable accuracy. While the decision-tree-based classifier outperforms the Bayesian classifier when features and training size are selected optimally for both, a carefully designed naive Bayesian classifier is more robust.
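As an illustration of the idea summarized above, the following is a minimal sketch of a naive Bayesian classifier over semi-structured messages. It is not the authors' implementation: the field names (`from`, `subject`, `body`), the field-tagged token scheme, the Laplace smoothing, and the class name `NaiveBayes` are all assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


def featurize(message):
    """Turn a message dict into a list of tokens.

    Assumption: the structured 'from' field becomes one field-tagged token,
    while the free-text 'subject' and 'body' fields are split into
    lowercase, field-tagged words.
    """
    tokens = ["from:" + message.get("from", "").lower()]
    for field in ("subject", "body"):
        for word in message.get(field, "").lower().split():
            tokens.append(field + ":" + word)
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()             # messages seen per folder
        self.token_counts = defaultdict(Counter)  # token counts per folder
        self.vocabulary = set()

    def train(self, messages, labels):
        for message, label in zip(messages, labels):
            self.class_counts[label] += 1
            for token in featurize(message):
                self.token_counts[label][token] += 1
                self.vocabulary.add(token)

    def predict(self, message):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = featurize(message)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + vocab_size
            for token in tokens:
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Tagging each token with its field lets the same word count differently in the subject and in the body, which is one simple way to treat structured fields and free text together in a single model.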

Similar resources

Linger – a Smart Personal Assistant for E-mail Classification

In this paper we present Linger, a neural-network-based system for automated e-mail classification. Two scenarios are explored: filing e-mails into folders and spam e-mail filtering. Extensive experiments indicate that Linger compares favourably to other classification techniques. We study the effects of various feature selection, weighting and normalization methods, and also the portability of ...

ifile: An Application of Machine Learning to E-Mail Filtering

The rise of the World Wide Web and the ever-increasing amounts of machine-readable text have caused text classification to become an important aspect of machine learning. One specific application that has the potential to affect almost every user of the Internet is e-mail filtering. The WorldTalk Corporation estimates that over 60 million business people use e-mail [6]. Many more use e-mail purel...

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is unwanted e-mail that harms communications around the world. It is a growing problem for personal e-mail, so detecting it is essential. Machine learning is well suited to this problem: because it is adaptive, it can learn the patterns required for classification and shows good results. Nonetheless, in spam detection, there are a large num...

Evaluation of Comparative Advantage in the Wood Products Industry in Iran

Economists believe that the development of the industrial sector benefits other sectors of the economy. Among the 23 groups of industrial activity in the ISIC classification, the production and export of wood industries have received less attention in developing countries. Exports of wood products have a major impact on GDP and provide employment and sustainable development of natural resou...

Publication date: 2000